
# MULTIMEDIA UNIVERSITY

## FINAL EXAMINATION

**TRIMESTER 2, 2018/2019 SESSION**

**ECE 3226 - ADVANCED COMPUTER ARCHITECTURE AND  
PARALLEL COMPUTING**  
(CE)

1 MAR 2019  
09:00 A.M – 11:00 A.M  
(2 Hours)

## INSTRUCTIONS TO STUDENT

1. This question paper consists of five (5) pages only (including this page).
2. There are **FOUR (4) QUESTIONS** in this paper. Answer **ALL QUESTIONS**. All questions carry 25 marks each.
3. Write your answers in the Answer Booklet provided.

**Question 1**

a) Flynn's taxonomy is a classification of parallel computer architectures based on the number of concurrent instruction streams and data streams available in the architecture. Illustrate the FOUR categories of Flynn's taxonomy.

[4 marks]
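
As a compact illustration, the four categories follow directly from whether an architecture has one or many instruction streams and one or many data streams. The Python sketch below encodes that mapping; the function name `flynn_class` is hypothetical, and the stream counts are abstract inputs rather than measurements of a real machine.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given numbers of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"  # single or multiple instruction streams
    d = "S" if data_streams == 1 else "M"         # single or multiple data streams
    return f"{i}I{d}D"


if __name__ == "__main__":
    for streams in [(1, 1), (1, 4), (4, 1), (4, 4)]:
        print(streams, flynn_class(*streams))
```

Typical examples are SISD for a conventional uniprocessor, SIMD for vector or array processors, MISD for rare designs such as redundant fault-tolerant pipelines, and MIMD for multiprocessors and multicore CPUs.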

b) In parallel computing, a processing job is broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions, and instructions from different parts execute simultaneously on different processors. As a computer architect, explain, with diagrams, the differences in characteristics between MIMD and SIMD computer architectures.

[8 marks]

c) A particular program runs in 5 seconds on computer A, which has an 8 GHz clock. We are helping a computer engineer build a computer, B, that will run this program in 3 seconds. The engineer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program.

- i. Estimate the number of CPU clock cycles for computer A.

[5 marks]

- ii. Estimate the clock rate for computer B.

[5 marks]
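
Both estimates follow from CPU time = clock cycles / clock rate. A minimal Python sketch of the calculation, with hypothetical helper names:

```python
def cpu_clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    # CPU time = clock cycles / clock rate, so clock cycles = CPU time * clock rate
    return cpu_time_s * clock_rate_hz


def required_clock_rate(clock_cycles: float, cpu_time_s: float) -> float:
    # clock rate = clock cycles / CPU time
    return clock_cycles / cpu_time_s


cycles_a = cpu_clock_cycles(5, 8e9)        # computer A: 5 s on an 8 GHz clock
cycles_b = 1.2 * cycles_a                  # B needs 1.2 times as many cycles as A
rate_b = required_clock_rate(cycles_b, 3)  # B must finish the program in 3 s
print(f"Cycles on A: {cycles_a:.3e}")
print(f"Clock rate for B: {rate_b / 1e9:.1f} GHz")
```

With these figures, computer A executes 40 × 10^9 cycles, and computer B must execute 48 × 10^9 cycles in 3 s, which requires a 16 GHz clock.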

d) In Direct Memory Access (DMA), an input/output (I/O) device is allowed to send or receive data directly to or from main memory, bypassing the CPU to speed up memory operations. In your own words, specify THREE types of DMA transfer technique.

[3 marks]


**Question 2**

a) To provide more modularity at reduced cost, an easy way to implement symmetric multiprocessing is to plug more than one CPU into the shared system bus. Briefly describe and discuss TWO advantages and TWO disadvantages of the Shared Bus architecture for symmetric multiprocessing.

[6 marks]

b) There are three types of data hazard, classified by the order of the read and write accesses in instructions. Name the three types of data hazard and illustrate each by considering two instructions, A and B, where A occurs before B.

[9 marks]
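
To make the three cases concrete, each instruction can be reduced to the set of registers it reads and the set it writes; comparing those sets for A (earlier) and B (later) identifies the RAW, WAR and WAW cases. Strictly, the sets reveal dependences, and whether each one becomes an actual hazard depends on the pipeline's timing. A sketch in Python, where the class `Instr`, the function `data_hazards` and the MIPS-style instruction strings are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    text: str
    reads: set = field(default_factory=set)   # source registers
    writes: set = field(default_factory=set)  # destination registers


def data_hazards(a: Instr, b: Instr) -> list:
    """Classify data hazards between A and B, where A precedes B in program order."""
    hazards = []
    if a.writes & b.reads:
        hazards.append("RAW")  # hazard if B reads the register before A writes it
    if a.reads & b.writes:
        hazards.append("WAR")  # hazard if B writes the register before A reads it
    if a.writes & b.writes:
        hazards.append("WAW")  # hazard if B writes the register before A writes it
    return hazards


# Hypothetical MIPS-style examples (destination register first).
a = Instr("ADD R1, R2, R3", reads={"R2", "R3"}, writes={"R1"})
print(data_hazards(a, Instr("SUB R4, R1, R5", reads={"R1", "R5"}, writes={"R4"})))
print(data_hazards(a, Instr("MUL R2, R6, R7", reads={"R6", "R7"}, writes={"R2"})))
print(data_hazards(a, Instr("OR R1, R8, R9", reads={"R8", "R9"}, writes={"R1"})))
```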

c) Pipelining is a technique in which the execution of multiple instructions is overlapped. A pipeline is divided into stages, and these stages are connected to one another to form a pipe-like structure. Instructions enter at one end and exit at the other. Pipelines can be divided into two classes. Illustrate and explain these two classes.

[5 marks]
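
The overlap described in part (c) can be pictured as a space-time diagram. The following Python sketch assumes an ideal linear pipeline (one cycle per stage, no stalls) and hypothetical stage names IF, ID, EX, MEM and WB; with k stages, n instructions finish in k + n - 1 cycles instead of n × k.

```python
def space_time_diagram(n_instructions: int, stages: list) -> int:
    """Print an ideal linear-pipeline space-time diagram and return the total cycles."""
    k = len(stages)
    total_cycles = k + n_instructions - 1
    for cycle in range(total_cycles):
        cells = []
        for s, stage in enumerate(stages):
            instr = cycle - s  # instruction j (0-based) occupies stage s during cycle j + s
            label = f"I{instr + 1}" if 0 <= instr < n_instructions else "--"
            cells.append(f"{stage}:{label}")
        print(f"cycle {cycle + 1:2d}  " + "  ".join(cells))
    return total_cycles


cycles = space_time_diagram(4, ["IF", "ID", "EX", "MEM", "WB"])
print(f"pipelined: {cycles} cycles, unpipelined: {4 * 5} cycles")
```

For four instructions through five stages this gives 8 cycles, against the 20 an unpipelined design would need.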

d) Several factors, known as pipeline conflicts, can cause a pipeline to deviate from its normal performance. Briefly describe FIVE factors that can cause this deviation.

[5 marks]


**Question 3**

a) Define the terms initiation rate, convoy, chime and vector start-up time.  
[8 marks]

b) Shared Bus and Crossbar Switch are common interconnection networks for Multiprocessor Systems (MPS). Briefly compare them, listing TWO advantages and TWO disadvantages of each.  
[8 marks]

c) Given 8 memory banks, a bank busy time of 6 clock cycles and a total memory latency of 12 clock cycles:

- Calculate the number of clock cycles required to complete a 64-element vector load with a stride of 1.  
[3 marks]
- Estimate the number of clock cycles required to complete a 64-element vector load with a stride of 32.  
[3 marks]
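
The bank-conflict reasoning in part (c) can be checked with a small cycle-level model. This Python sketch assumes word-interleaved banks (element i maps to bank (i × stride) mod 8), at most one address issued per clock, each bank blocked for the bank busy time after an access, and a total time equal to the memory latency plus the span of the issue sequence. `vector_load_cycles` is a hypothetical helper, not a standard routine.

```python
def vector_load_cycles(n_elements: int, stride: int, n_banks: int = 8,
                       bank_busy: int = 6, latency: int = 12) -> int:
    """Clock cycles to load a vector from interleaved memory banks (simplified model)."""
    bank_free_at = [0] * n_banks  # earliest cycle each bank can accept a new access
    issue = -1                    # issue cycle of the previous element
    for i in range(n_elements):
        bank = (i * stride) % n_banks               # assumed word-interleaved mapping
        issue = max(issue + 1, bank_free_at[bank])  # next issue slot, or wait for the bank
        bank_free_at[bank] = issue + bank_busy      # bank stays busy after this access
    return latency + issue + 1  # latency plus the cycles spanned by the issue sequence


print(vector_load_cycles(64, stride=1))
print(vector_load_cycles(64, stride=32))
```

Under this model a stride of 1 never waits on a busy bank (12 + 64 = 76 cycles), while a stride of 32, a multiple of 8, sends every element to the same bank, so each access after the first waits the 6-cycle bank busy time (12 + 1 + 6 × 63 = 391 cycles).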

d) Show how the following code sequence is laid out in convoys, assuming a single copy of each vector functional unit:

| Instruction | Operands | Comment                 |
|-------------|----------|-------------------------|
| LV      | V1,Rx    | ;load vector X          |
| MULVS.D | V2,V1,F0 | ;vector-scalar multiply |
| LV      | V3,Ry    | ;load vector Y          |
| ADDVV.D | V4,V2,V3 | ;add two vectors        |
| SV      | V4,Ry    | ;store the sum          |

[3 marks]
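
Convoy formation can be sketched as a greedy pass over the instructions: an instruction joins the current convoy unless it needs a functional unit that the convoy already uses. The Python sketch assumes chaining (so dependent instructions may share a convoy) and a single load/store unit shared by LV and SV; the unit names and the function `form_convoys` are hypothetical.

```python
# Functional unit used by each opcode (assumption: one shared load/store unit,
# one multiply unit and one add unit, i.e. a single copy of each).
UNIT = {"LV": "load_store", "SV": "load_store", "MULVS.D": "multiply", "ADDVV.D": "add"}


def form_convoys(program):
    """Group instructions into convoys, starting a new convoy on a structural hazard."""
    convoys, current, busy = [], [], set()
    for opcode, operands in program:
        unit = UNIT[opcode]
        if unit in busy:  # unit already used in this convoy: structural hazard
            convoys.append(current)
            current, busy = [], set()
        current.append(f"{opcode} {operands}")
        busy.add(unit)
    if current:
        convoys.append(current)
    return convoys


program = [
    ("LV", "V1,Rx"),          # load vector X
    ("MULVS.D", "V2,V1,F0"),  # vector-scalar multiply
    ("LV", "V3,Ry"),          # load vector Y
    ("ADDVV.D", "V4,V2,V3"),  # add two vectors
    ("SV", "V4,Ry"),          # store the sum
]
for number, convoy in enumerate(form_convoys(program), start=1):
    print(f"Convoy {number}: {convoy}")
```

Under these assumptions the sequence forms three convoys (LV with MULVS.D, LV with ADDVV.D, and SV alone), so it takes three chimes.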


**Question 4**

a) Dataflow architecture is a computer architecture that directly contrasts with the traditional von Neumann architecture. Briefly describe what a data-flow graph is and define a node in a data-flow graph.

[4 marks]

b) Very long instruction word (VLIW) describes a computer processing architecture in which a language compiler or pre-processor breaks program instructions down into basic operations that can be performed by the processor in parallel. VLIW machines behave much like superscalar machines, with THREE differences. List the differences.

[6 marks]

c) Briefly describe, in your own words, static and dynamic dataflow machines.

[2 marks]

d) Construct a data-flow graph for the following nested if-then-else statement.

```
if x > y then
    if k = i then
        z = a + b
    else
        z = c * d
    endif
else
    z = e + f
endif
```

[13 marks]
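
One way to sanity-check a data-flow graph for the nested conditional is to execute it with token-firing rules. The Python sketch below assumes three textbook-style node kinds: an operator node that fires when all its input tokens are present, a switch node that routes a data token to its T or F output according to a Boolean control token, and a merge node that forwards whichever input token arrives. Arc names such as `c1` and `a_TT`, and the functions `build_graph` and `run`, are hypothetical. Tokens on unused switch outputs are simply left unconsumed, and tokens are not removed after use, a simplification that works here because each node fires at most once for a single set of inputs.

```python
import operator


def build_graph():
    """Data-flow graph for the nested if-then-else as (kind, fn, input arcs, output arcs)."""
    graph = [("op", operator.gt, ["x", "y"], "c1")]  # c1 = x > y
    # Outer switches on c1: T outputs of k, i, a, b, c, d feed the inner if;
    # F outputs of e, f feed the outer else branch; other outputs are unused.
    for v in ["k", "i", "a", "b", "c", "d", "e", "f"]:
        graph.append(("switch", None, [v, "c1"], (f"{v}_T", f"{v}_F")))
    graph.append(("op", operator.eq, ["k_T", "i_T"], "c2"))  # c2 = (k = i)
    # Inner switches: route a, b, c, d by c2.
    for v in ["a", "b", "c", "d"]:
        graph.append(("switch", None, [f"{v}_T", "c2"], (f"{v}_TT", f"{v}_TF")))
    graph += [
        ("op", operator.add, ["a_TT", "b_TT"], "z1"),  # z = a + b
        ("op", operator.mul, ["c_TF", "d_TF"], "z2"),  # z = c * d
        ("op", operator.add, ["e_F", "f_F"], "z3"),    # z = e + f
        ("merge", None, ["z1", "z2"], "z_inner"),      # result of the inner if
        ("merge", None, ["z_inner", "z3"], "z"),       # result of the outer if
    ]
    return graph


def run(graph, inputs):
    """Fire nodes whose firing rule is satisfied until none can fire; return token z."""
    tokens = dict(inputs)  # arc name -> token value
    fired = set()
    progress = True
    while progress:
        progress = False
        for idx, (kind, fn, ins, out) in enumerate(graph):
            if idx in fired:
                continue
            if kind == "merge":
                ready = [arc for arc in ins if arc in tokens]
                if not ready:
                    continue
                tokens[out] = tokens[ready[0]]
            elif all(arc in tokens for arc in ins):
                if kind == "op":
                    tokens[out] = fn(*(tokens[arc] for arc in ins))
                else:  # switch: send the data token to the T or F output
                    data, ctrl = ins
                    tokens[out[0] if tokens[ctrl] else out[1]] = tokens[data]
            else:
                continue
            fired.add(idx)
            progress = True
    return tokens.get("z")


values = dict(x=5, y=3, k=1, i=1, a=2, b=3, c=4, d=6, e=7, f=8)
print(run(build_graph(), values))
```

Changing `i` to a value different from `k` exercises the inner else branch (c * d), and making `x` not greater than `y` exercises the outer else branch (e + f).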

End of Paper